@dnhatn
Member

@dnhatn dnhatn commented Jul 17, 2025

A STATS command that follows a reduce operation, such as another STATS, LIMIT, TopN, INLINE STATS, or FORK, is executed on the coordinator and can therefore be performed in a single phase. This is useful for time-series queries, where we execute two aggregations: first per time-series, then across time-series.
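To make the distinction concrete, here is a minimal sketch of an average computed in two phases versus a single phase. It is an illustration only: `AvgPhases`, `Partial`, and the method names are hypothetical and are not taken from the Elasticsearch code base.

```java
import java.util.List;

// Hypothetical illustration; none of these names exist in Elasticsearch.
public class AvgPhases {
    // Intermediate state that a two-phase average passes between phases.
    static final class Partial {
        final double sum;
        final long count;

        Partial(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // Phase 1 (partial): reduce one chunk of rows to a (sum, count) pair.
    static Partial partial(List<Double> chunk) {
        double sum = 0;
        for (double v : chunk) {
            sum += v;
        }
        return new Partial(sum, chunk.size());
    }

    // Phase 2 (final): merge the partial states and compute the average.
    static double finalAvg(List<Partial> partials) {
        double sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return sum / count;
    }

    // Single phase: all rows are already in one place, so the
    // intermediate state and the merge step are not needed.
    static double singlePhaseAvg(List<Double> values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }
}
```

When the input of the second STATS already lives on the coordinator, the partial/final split only adds an extra step, which is the overhead the single-phase path avoids.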

@dnhatn dnhatn force-pushed the single-phase-aggs branch 2 times, most recently from d342799 to 48274a9 Compare July 18, 2025 00:29
@dnhatn dnhatn closed this Sep 30, 2025
@dnhatn dnhatn deleted the single-phase-aggs branch September 30, 2025 16:39
@dnhatn dnhatn restored the single-phase-aggs branch September 30, 2025 16:39
@dnhatn dnhatn reopened this Sep 30, 2025
@dnhatn dnhatn removed the v9.2.0 label Sep 30, 2025
@dnhatn dnhatn force-pushed the single-phase-aggs branch 4 times, most recently from 264d405 to 21e3afc Compare September 30, 2025 18:03
@elasticsearchmachine
Collaborator

Hi @dnhatn, I've created a changelog YAML for you.

@dnhatn dnhatn marked this pull request as ready for review September 30, 2025 19:58
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Sep 30, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

rate_bytes_in:double | time_bucket:datetime
null | 2024-05-10T00:01:00.000Z
28.80297619047619 | 2024-05-10T00:15:00.000Z
28.802976190476194 | 2024-05-10T00:15:00.000Z
Member Author

This is a side effect of running a single-phase aggregation for avg.
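One plausible reason for a last-digit difference like this is that floating-point addition is not associative, so summing values in a different grouping can change the rounding. The sketch below only demonstrates that general effect; it is not the aggregation code from this PR, and `SummationOrder` is a hypothetical name.

```java
// Hypothetical illustration of floating-point summation order.
public class SummationOrder {
    // Sum left to right, as a single pass over all values would.
    static double sequential(double a, double b, double c) {
        return (a + b) + c;
    }

    // Combine b and c first, as if they had been pre-aggregated
    // into a partial sum before being merged with a.
    static double regrouped(double a, double b, double c) {
        return a + (b + c);
    }

    public static void main(String[] args) {
        // The two results can differ in the last digit.
        System.out.println(sequential(0.1, 0.2, 0.3));
        System.out.println(regrouped(0.1, 0.2, 0.3));
    }
}
```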

@elastic elastic deleted a comment Oct 1, 2025
Contributor

@alex-spies alex-spies left a comment

Heya, cool stuff!

A little bit of a drive-by, but I noticed that there is an opportunity for simplification; please have a look at my comment.

That said, I see that you requested review from @fang-xing-esql and @ivancea - folks, it'd be great if (one of) you could remain main reviewer, as I won't be able to deeply review this atm. Thank you :)

* For example, in FROM .. | STATS first | STATS second, the STATS second aggregation
* can be executed in a single phase on the coordinator instead of two phases.
*/
public class SinglePhaseAggregate extends PhysicalOptimizerRules.OptimizerRule<AggregateExec> {
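The idea in the Javadoc excerpt above can be sketched on a toy plan. This is not the implementation from the PR: `SinglePhaseSketch`, `Node`, `Mode`, and the string node names are all hypothetical, and treating any reduce node anywhere below the STATS as sufficient is an assumption made for the illustration.

```java
import java.util.Set;

// Toy plan model; every type here is hypothetical and far simpler
// than the real ES|QL physical plan.
public class SinglePhaseSketch {
    enum Mode { INITIAL, FINAL, SINGLE }

    static final class Node {
        final String name; // e.g. "FROM", "STATS", "LIMIT"
        final Node child;  // null for the source node
        Mode mode;         // only meaningful for STATS nodes

        Node(String name, Node child, Mode mode) {
            this.name = name;
            this.child = child;
            this.mode = mode;
        }
    }

    // Assumption: these node kinds gather their input on the coordinator.
    static final Set<String> REDUCE =
        Set.of("STATS", "LIMIT", "TOPN", "INLINESTATS", "FORK");

    // True if the subtree rooted at n contains a reduce node.
    static boolean hasReduce(Node n) {
        for (Node cur = n; cur != null; cur = cur.child) {
            if (REDUCE.contains(cur.name)) {
                return true;
            }
        }
        return false;
    }

    // Mark every STATS that sits above a reduce node as SINGLE.
    static void apply(Node n) {
        for (Node cur = n; cur != null; cur = cur.child) {
            if (cur.name.equals("STATS") && hasReduce(cur.child)) {
                cur.mode = Mode.SINGLE;
            }
        }
    }
}
```

For a plan shaped like `FROM .. | STATS first | STATS second`, this sketch switches only the outer STATS to `SINGLE` and leaves the inner one untouched.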
Contributor

This should work, but it's actively working against code in the mapper that already distinguishes between "happens only on the coordinator" and "happens partially on data nodes", see here.

Contributor

@ivancea ivancea left a comment

It looks good to me! The new rule looks sensible in any case. But as I understand it, what Alex suggests would be the ideal way: not splitting into phases to begin with, and splitting aggs only when needed.

Maybe this is ok too, as the rule can easily be removed later. But I would investigate the other way. I believe it will be quite a bit more complex, though...

Edit: Btw, this was 3 months old? Did something change since then that enables this PR again, or was it just "forgotten" there?

@dnhatn
Member Author

dnhatn commented Oct 1, 2025

@alex-spies @ivancea I started with Mapper, but that class is quite fragile, so I added a separate rule instead. I will try to get this in for time-series soon, and then follow up by replacing the rule with changes in Mapper. Thanks for reviewing!

@dnhatn dnhatn merged commit 57e3a3f into elastic:main Oct 1, 2025
34 checks passed
@dnhatn dnhatn deleted the single-phase-aggs branch October 1, 2025 14:14
@dnhatn
Member Author

dnhatn commented Oct 1, 2025

Btw, this was 3 months old? Did something change since then that enables this PR again, or was it just "forgotten" there?

I started this three months ago for time-series, but didn't spend much time on it because the difference was smaller back then. Now the impact is more noticeable for low-latency queries and for queries with large final results.

elasticsearchmachine pushed a commit that referenced this pull request Oct 1, 2025
Follow-up to #131485.

Interestingly, the physical plan optimizer tests seem to pick up more
cases when we map to a SINGLE agg directly.

Labels

:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.2.0

4 participants